Multivariate Welch t-test on distances
Author
Abstract
MOTIVATION: Permutational non-Euclidean analysis of variance (PERMANOVA) is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. The method recognizes that the pairwise distance matrix between observations is sufficient to compute the within- and between-group sums of squares needed to form the (pseudo) F statistic. Moreover, arbitrary distances, not only Euclidean ones, can be used. The method, however, suffers from loss of power and type I error inflation in the presence of heteroscedasticity and sample size imbalance.
RESULTS: We develop a solution in the form of a distance-based Welch t-test, [Formula: see text], for two-sample, potentially unbalanced and heteroscedastic data. We demonstrate empirically the desirable type I error and power characteristics of the new test. We compare the performance of PERMANOVA and [Formula: see text] in a reanalysis of two existing microbiome datasets, the field in which the methodology originated.
AVAILABILITY AND IMPLEMENTATION: The source code for the methods and analysis in this article is available at https://github.com/alekseyenko/Tw2. Further guidance on applying these methods can be obtained from the author.
CONTACT: [email protected].
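The claim that a distance matrix alone suffices can be illustrated with a minimal Python sketch. It is not the code from the repository above. The function names (`half_sum_sq`, `pseudo_f`, `welch_t2`, `permutation_pvalue`) are hypothetical, and the Welch-type statistic is an assumption: it is obtained by rewriting the classical two-sample Welch t statistic, squared, in terms of within-group sums of squared distances, which may differ in detail from the statistic defined in the article.

```python
import numpy as np

def half_sum_sq(d):
    """Sum of squared distances over unordered pairs i < j."""
    return np.sum(np.triu(d, k=1) ** 2)

def pseudo_f(d, labels):
    """PERMANOVA pseudo-F computed from a distance matrix only."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, a = len(labels), len(groups)
    ss_total = half_sum_sq(d) / n
    ss_within = sum(
        half_sum_sq(d[np.ix_(labels == g, labels == g)]) / np.sum(labels == g)
        for g in groups
    )
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def welch_t2(d, labels):
    """Assumed distance-based analogue of the squared Welch t (two groups)."""
    labels = np.asarray(labels)
    g1, g2 = np.unique(labels)  # raises unless there are exactly two groups
    m1, m2 = labels == g1, labels == g2
    n1, n2 = int(m1.sum()), int(m2.sum())
    s1 = half_sum_sq(d[np.ix_(m1, m1)])
    s2 = half_sum_sq(d[np.ix_(m2, m2)])
    # Between-group sum of squares; equals n1*n2/(n1+n2) * squared mean shift
    ss_between = half_sum_sq(d) / (n1 + n2) - s1 / n1 - s2 / n2
    numerator = (n1 + n2) / (n1 * n2) * ss_between
    # Per-group variance over group size, so variances are not pooled
    denominator = s1 / (n1**2 * (n1 - 1)) + s2 / (n2**2 * (n2 - 1))
    return numerator / denominator

def permutation_pvalue(stat, d, labels, n_perm=999, seed=0):
    """Permutation p-value: shuffle labels, keep the distance matrix fixed."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = stat(d, labels)
    hits = sum(stat(d, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

For univariate data with Euclidean distances, `pseudo_f` reduces to the squared pooled-variance t statistic and `welch_t2` to the squared Welch t statistic. The only difference between them is whether the within-group variances are pooled, which is where heteroscedasticity and unequal group sizes enter.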
Similar resources
Unequal group variances in microarray data analyses
MOTIVATION In searching for differentially expressed (DE) genes in microarray data, we often observe a fraction of the genes to have unequal variability between groups. This is not an issue in large samples, where a valid test exists that uses individual variances separately. The problem arises in the small-sample setting, where the approximately valid Welch test lacks sensitivity, while the mo...
Finding an unknown number of multivariate outliers
We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests u...
Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice
A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...
An empirical goodness-of-fit test for multivariate distributions
An empirical test is presented by which one may determine whether a specified multivariate probability model is suitable to describe the underlying distribution of a set of observations. This test is based on the premise that, given any probability distribution, the Mahalanobis distances corresponding to data generated from that distribution will likewise follow a distinct distribution that can...
The effect of external attentional focus instructions with different distances on balance and muscular activity amongst children with mental retardation
Objective: The aim of this study was to survey the effect of distal and proximal external attentional focus instructions on postural sway and muscular electrical activity among children with mental retardation. Method: For this purpose, 30 children aged 10-12 years (M = 11.2) were selected from exceptional children's schools and divided into three groups of 10 (distal external attentional focus, proxima...
Journal title:
Volume 32, issue
Pages -
Publication date: 2016